On Compound Poisson Approximation For Sequence Matching
Author
Abstract
Consider the sequences $\{X_i\}_{i=1}^{m}$ and $\{Y_j\}_{j=1}^{n}$ of independent random variables, which take values in a finite alphabet, and assume that the variables $X_1, X_2, \ldots$ and $Y_1, Y_2, \ldots$ follow the distributions $\mu$ and $\nu$, respectively. Two variables $X_i$ and $Y_j$ are said to match if $X_i = Y_j$. Let $W$ denote the number of matching subsequences of length $k$ between the two sequences when $r$ mismatches, $0 \le r < k$, are allowed. In this paper we use Stein's method to bound the total variation distance between the distribution of $W$ and a suitably chosen compound Poisson distribution. To derive rates of convergence, the case where $E[W]$ stays bounded away from infinity and the case where $E[W] \to \infty$ as $m, n \to \infty$ have to be treated separately. Under the assumption that $\ln n / \ln(mn) \to \rho \in (0, 1)$, we give conditions on the rate at which $k \to \infty$, and on the distributions $\mu$ and $\nu$, for which the variation distance tends to zero.
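To make the statistic concrete, the following is a minimal Python sketch of one way to count matching windows. It assumes that W counts every pair of start positions (i, j) whose length-k windows differ in at most r positions. The abstract does not spell out the exact counting convention (for example, whether overlapping matches are declumped), so the function name `count_window_matches` and this convention are illustrative assumptions, not the paper's definition.

```python
from typing import Sequence


def count_window_matches(x: Sequence, y: Sequence, k: int, r: int = 0) -> int:
    """Count pairs (i, j) whose length-k windows differ in at most r positions.

    Assumption: every qualifying pair of start positions is counted once,
    with no declumping of overlapping matches.
    """
    if k <= 0 or not 0 <= r < k:
        raise ValueError("need k > 0 and 0 <= r < k")
    count = 0
    # Slide a length-k window over each sequence and compare all window pairs.
    for i in range(len(x) - k + 1):
        for j in range(len(y) - k + 1):
            mismatches = sum(
                1 for a, b in zip(x[i:i + k], y[j:j + k]) if a != b
            )
            if mismatches <= r:
                count += 1
    return count
```

Under this convention, `count_window_matches("ACGT", "ACGA", k=3, r=0)` returns 1 and allowing one mismatch (`r=1`) returns 2. With `x = "AAAA"` and `y = "AAA"`, `k=2` gives 6: overlapping matches occur in clumps, which is the usual reason a compound Poisson law, rather than a plain Poisson law, is the natural approximation for counts of this kind.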
Similar resources
Compound Poisson approximation: a user’s guide
Compound Poisson approximation is a useful tool in a variety of applications, including insurance mathematics, reliability theory, and molecular sequence analysis. In this paper, we review the ways in which Stein's method can currently be used to derive bounds on the error in such approximations. The theoretical basis for the construction of error bounds is systematically discussed, and a numbe...
On the bounds in Poisson approximation for independent geometric distributed random variables
The main purpose of this note is to establish some bounds in Poisson approximation for row-wise arrays of independent geometric distributed random variables using the operator method. Some results related to random sums of independent geometric distributed random variables are also investigated.
A Compound Poisson Approximation Inequality
We give conditions under which the number of events which occur in a sequence of m-dependent events is stochastically smaller than a suitably defined compound Poisson random variable. The results are applied to counts of sequence pattern appearances and to system reliability. We also provide a numerical example.
On Runs in Independent Sequences
Given an i.i.d. sequence of n letters from a finite alphabet, we consider the length of the longest run of any letter. In the equiprobable case, results for this run turn out to be closely related to the well-known results for the longest run of a given letter. For coin-tossing, tail probabilities are compared for both kinds of runs via Poisson approximation.
Normal and Compound Poisson Approximations for Pattern Occurrences in NGS Reads
Next generation sequencing (NGS) technologies are now widely used in many biological studies. In NGS, sequence reads are randomly sampled from the genome sequence of interest. Most computational approaches for NGS data first map the reads to the genome and then analyze the data based on the mapped reads. Since many organisms have unknown genome sequences and many reads cannot be uniquely mapped...
Journal:
- Combinatorics, Probability & Computing
Volume 9
Publication date: 2000